1. Introduction


As any area matures, there is the need to understand its components and their relationships. An experimental process provides a basis for the needed advancement in knowledge and understanding. Since software engineering is in its adolescence, it is certainly a candidate for the experimental method of analysis. Experimentation is performed in order to help us better evaluate, predict, understand, control, and improve the software development process and product.

Experimentation in software engineering, as with any other experimental procedure, involves an iteration of a hypothesize-and-test process. Models of the software process or product are built, hypotheses about these models are tested, and the information learned is used to refine the old hypotheses or develop new ones. In an area like software engineering, this approach takes on special importance because we greatly need to improve our knowledge of how software is developed, the effect of various technologies, and what areas most need improvement. There is a great deal to be learned, and intuition is not always the best teacher.

In this paper we lay out a framework for analyzing most of the experimental work that has been performed in software engineering over the past several years. We then discuss a variety of these experiments, their results, and the impact they have had on our knowledge of the software engineering discipline.

2. Objectives

There are three overall goals for this work. The first objective is to describe a framework for experimentation in software engineering. The framework for experimentation is intended to help structure the experimental process and to provide a classification scheme for understanding and evaluating experimental studies. The second objective is to classify and discuss a variety of experiments from the literature according to the framework. The description of several software engineering studies is intended to provide an overview of the knowledge resulting from experimental work, a summary of current research directions, and a basis for learning from past experience with experimentation. The third objective is to identify problem areas and lessons learned in experimentation in software engineering. The presentation of problem areas and lessons learned is intended to focus attention on general trends in the field and to provide the experimenter with useful recommendations for performing future studies. The following three sections address these goals.

3. Experimentation Framework

The framework of experimentation, summarized in Figure 1, consists of four categories corresponding to phases of the experimentation process: I) definition, II) planning, III) operation, and IV) interpretation. The following sections discuss each of these four phases.

3.1. Experiment Definition

The first phase of the experimental process is the study definition phase. The study definition phase contains six parts: A) motivation, B) object, C) purpose, D) perspective, E) domain, and F) scope. Most study definitions contain each of the six parts; an example definition appears in Figure 2.
There can be several motivations, objects, purposes, or perspectives in an experimental study. For example, the motivation of a study may be to understand, assess, or improve the effect of a certain technology. The "object of study" is the primary entity examined in a study. A study may examine the final software product, a development process (e.g., inspection process, change process), a model (e.g., software reliability model), etc. The purpose of a study may be to characterize the change in a system over time, to evaluate the effectiveness of testing processes, to predict system development cost by using a cost model, to motivate¹ the validity of a theory by analyzing empirical evidence, etc. In experimental studies that examine "software quality," the interpretation usually includes correctness if it is from the perspective of a developer, or reliability if it is from the perspective of a customer. Studies that examine metrics for a given project type from the perspective of the project manager may interest certain project managers, while corporate managers may only be interested if the metrics apply across several project types.
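Because a study can have several motivations, objects, purposes, or perspectives, it can help to record a definition as a structured record before planning begins. The sketch below is illustrative only. The class name `StudyDefinition` and its field names are hypothetical. The example values are paraphrased from the description of the unit testing study [24] in Section 4.1 and are not a reproduction of Figure 2.

```python
from dataclasses import dataclass


@dataclass
class StudyDefinition:
    """Hypothetical record of the six definition parts (A-F) of Section 3.1.

    A study may have several motivations, objects, purposes, or
    perspectives, so those parts are lists; the scope is a single label.
    """
    motivation: list[str]
    object_of_study: list[str]
    purpose: list[str]
    perspective: list[str]
    domain: list[str]
    scope: str

    def __post_init__(self) -> None:
        # An empty part signals an incomplete study definition.
        for name in ("motivation", "object_of_study", "purpose", "perspective", "domain"):
            if not getattr(self, name):
                raise ValueError(f"study definition is missing its {name}")
        if not self.scope:
            raise ValueError("study definition is missing its scope")

    def summary(self) -> str:
        # Render the definition on one line: "part: value, value; part: ...".
        parts = [
            ("motivation", self.motivation),
            ("object", self.object_of_study),
            ("purpose", self.purpose),
            ("perspective", self.perspective),
            ("domain", self.domain),
            ("scope", [self.scope]),
        ]
        return "; ".join(f"{label}: {', '.join(values)}" for label, values in parts)


# Example values paraphrased from the description of study [24] in Section 4.1.
unit_testing_study = StudyDefinition(
    motivation=["improve unit testing", "understand unit testing"],
    object_of_study=["code reading", "functional testing", "structural testing"],
    purpose=["characterize", "evaluate"],
    perspective=["developer"],
    domain=["programmers", "programs"],
    scope="blocked subject-project",
)

if __name__ == "__main__":
    print(unit_testing_study.summary())
```

The validation in `__post_init__` reflects the observation that most study definitions contain all six parts. A definition that leaves one out is flagged early rather than discovered during analysis.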

Two important domains that are considered in experimental studies of software are I) the individual programmers or programming teams (the "teams") and II) the programs or projects (the "projects"). "Teams" are (possibly single-person) groups that work separately, and "projects" are separate programs or problems on which teams work. Teams may be characterized by experience, size, organization, etc., and projects may be characterized by size, complexity, application, etc. A general classification of the scopes of experimental studies can be obtained by examining the sizes of these two domains considered (see Figure 3). Blocked subject-project studies examine one or more objects across a set of teams and a set of projects. Replicated project studies examine object(s) across a set of teams and a single project, while multi-project variation studies examine object(s) across a single team and a set of projects. Single project studies examine object(s) on a single team and a single project. As the representativeness of the samples examined and the scope of examination increase, the wider-reaching a study's conclusions become.

¹ For clarification, the usage of the word "motivate" as a study purpose is distinct from the study "motivation."
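The scope classification of Figure 3 depends only on whether each domain contains one member or more than one. The following minimal sketch encodes that rule. The function name is hypothetical. Single individuals are counted as single-person teams, as the text allows.

```python
def classify_scope(num_teams: int, num_projects: int) -> str:
    """Map the sizes of the team and project domains to one of the four scopes.

    Only "one" versus "more than one" in each domain matters.
    """
    if num_teams < 1 or num_projects < 1:
        raise ValueError("a study examines at least one team and one project")
    if num_teams > 1 and num_projects > 1:
        return "blocked subject-project"
    if num_teams > 1:
        return "replicated project"
    if num_projects > 1:
        return "multi-project variation"
    return "single project"


if __name__ == "__main__":
    # 74 programmers each testing four programs, as in study [24] (Section 4.1).
    print(classify_scope(74, 4))
    # 19 teams and individuals each building the same compiler, as in study [15] (Section 4.2).
    print(classify_scope(19, 1))
```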

3.2. Experiment Planning

The second phase of the experimental process is the study planning phase. The following paragraphs discuss three aspects of the experiment planning phase: A) design, B) criteria, and C) measurement.

The design of an experiment couples the study scope with analytical methods and indicates the domain samples to be examined. Fractional factorial or randomized block designs usually apply in blocked subject-project studies, while completely randomized or incomplete block designs usually apply in multi-project and replicated project studies [33, 40]. Multivariate analysis methods, including correlation, factor analysis, and regression [75, 80, 88], generally may be used across all experimental scopes. Statistical models may be formulated and customized as appropriate [80]. Non-parametric methods should be planned when only limited data may be available or distributional assumptions may not be met [90]. Sampling techniques [41] may be used to select representative programmers and programs/projects to examine.
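The following sketch shows a randomized block design. Subjects are first grouped into blocks by a factor that is expected to affect the outcome, and treatments are then assigned at random separately inside each block. Programmer expertise is used here as the blocking factor, following the example of subject expertise given under experiment preparation. The function name, subject identifiers, and treatment labels are hypothetical.

```python
import random
from collections import defaultdict


def randomized_block_assignment(subjects, block_of, treatments, seed=None):
    """Randomly assign treatments to subjects separately within each block.

    subjects   -- iterable of subject identifiers
    block_of   -- dict mapping each subject to its block (e.g. an expertise level)
    treatments -- list of treatment labels (e.g. techniques under study)
    Returns a dict mapping subject -> treatment. Within a block, treatment
    counts differ by at most one; they are equal when the block size is a
    multiple of the number of treatments (a complete, balanced block).
    """
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for subject in subjects:
        blocks[block_of[subject]].append(subject)

    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)      # random order of subjects within the block
        order = list(treatments)
        rng.shuffle(order)        # random choice of which treatments receive any leftover subjects
        for i, subject in enumerate(members):
            assignment[subject] = order[i % len(order)]
    return assignment


if __name__ == "__main__":
    # Hypothetical pool: 18 subjects, 6 at each of three expertise levels.
    levels = {f"s{i}": ("junior", "intermediate", "advanced")[i % 3] for i in range(18)}
    plan = randomized_block_assignment(list(levels), levels, ["A", "B", "C"], seed=42)
    for subject in sorted(plan, key=lambda s: int(s[1:])):
        print(subject, levels[subject], plan[subject])
```

Blocking keeps expertise from being confounded with treatment. Every expertise level sees every treatment, so treatment differences are not an artifact of which subjects happened to receive which technique.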


Different motivations, objects, purposes, perspectives, domains, and scopes require the examination of different criteria. Criteria that tend to be direct reflections of cost/quality include cost [111, 100, 80, 4, 28], errors/changes [49, 14, 109, 2, 81, 19], reliability [42, 64, 50, 70, 69, 70, 77, 95], and correctness [51, 61, 68]. Criteria that tend to be indirect reflections of cost/quality include data coupling [62, 48, 102, 78], information visibility [85, 83, 55], programmer understanding [98, 100, 107, 110], execution coverage [103, 21, 24], and size/complexity [17, 59, 71].

The concrete manifestations of the cost/quality aspects examined in the experiment are captured through measurement. Paradigms assist in the metric definition process: the goal-question-metric paradigm [20, 22, 25, 93] and the factor-criteria-metric paradigm [39, 72]. Once appropriate metrics have been defined, they may be validated to show that they capture what is intended [12, 18, 44, 50, 100, 113]. The data collection process includes developing automated collection schemes [15] and designing and testing data collection forms [22, 10]. The required data may include both objective and subjective data and different levels of measurement: nominal (or classificatory), ordinal (or ranking), interval, or ratio [99].
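The level of measurement determines which summaries are meaningful. The sketch below follows the conventional correspondence: the mode for nominal data, the median for ordinal data, and the mean for interval and ratio data. It also allows statements such as "twice as many" only on a ratio scale. The names `Level`, `central_tendency`, and `ratio_of` are hypothetical. The example values in the comments are illustrative and are not data from any cited study.

```python
import statistics
from enum import IntEnum


class Level(IntEnum):
    """Levels of measurement, ordered so each level permits what the lower ones do."""
    NOMINAL = 1   # categories only (e.g. a fault class such as "interface")
    ORDINAL = 2   # ordered categories (e.g. a severity ranking)
    INTERVAL = 3  # equal differences are meaningful, zero is arbitrary
    RATIO = 4     # a true zero, so ratios are meaningful (e.g. staff-hours)


def central_tendency(values, level):
    """Return a summary that is meaningful at the given level of measurement."""
    if level >= Level.INTERVAL:
        return statistics.mean(values)
    if level == Level.ORDINAL:
        # median_low always returns an observed value, so two ranks are never
        # averaged (averaging would assume interval spacing between ranks).
        return statistics.median_low(values)
    return statistics.mode(values)


def ratio_of(a, b, level):
    """Statements such as "twice as much" are only meaningful on a ratio scale."""
    if level < Level.RATIO:
        raise ValueError(f"ratios are not meaningful at the {level.name.lower()} level")
    return a / b


if __name__ == "__main__":
    print(central_tendency(["interface", "control", "interface"], Level.NOMINAL))
    print(central_tendency([1, 2, 3, 4], Level.ORDINAL))
    print(ratio_of(10, 5, Level.RATIO))
```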

3.3. Experiment Operation

The third phase of the experimental process is the study operation phase. The operation of the experiment consists of A) preparation, B) execution, and C) analysis. Before conducting the actual experiment, preparation may include a pilot study to confirm the experimental scenario, help organize experimental factors (e.g., subject expertise), or inoculate the subjects [44, 43, 63, 24, 110, 73]. Experimenters collect and validate the defined data during the execution of the study [18, 109]. The analysis of the data may include a combination of quantitative and qualitative methods [30]. The preliminary screening of the data, probably using plots and histograms, usually precedes the formal data analysis. The process of analyzing the data requires the investigation of any underlying assumptions (e.g., distributional) before the application of the statistical models and tests.

3.4. Experiment Interpretation

The fourth phase of the experimental process is the study interpretation phase. The interpretation of the experiment consists of A) interpretation context, B) extrapolation, and C) impact. The results of the data analysis from a study are interpreted in a broadening series of contexts. These contexts of interpretation are the statistical framework in which the result is derived, the purpose of the particular study, and the knowledge in the field of research [15]. The representativeness of the sampling analyzed in a study qualifies the extrapolation of the results to other environments [20]. Several follow-up activities contribute to the impact of a study: presenting/publishing the results for feedback, replicating the experiment [33, 40], and actually applying the results by modifying methods for software development, maintenance, management, and research.

4. Classification of Analyses

Several investigators have published studies in the four general scopes of examination: blocked subject-project, replicated project, multi-project variation, or single project. The following sections cite studies from each of these categories. Note that surveys on experimental methodology in empirical studies include [35, 96, 74]. Each of the sections first discusses one experiment in moderate depth, using italicized keywords from the framework for experimentation, and then chronologically presents an overview of several others in the category.

4.1. Blocked Subject-Project Studies

With a motivation to improve and better understand unit testing, [24] conducted a study whose purpose was to characterize and evaluate the processes (i.e., objects) of code reading, functional testing, and structural testing from the perspective of the developer. The testing processes were examined in a blocked subject-project scope, where 74 programmers ranging from students to professionals (from the programmer domain) tested four unit-size programs (from the program domain) in a replicated fractional factorial design. Objective measurement of the testing processes was in several criteria areas: fault detection effectiveness, fault detection cost, and classes of faults detected. Experiment preparation included a pilot study [63], execution incorporated both manual and automated monitoring of testing activity, and analysis used analysis of variance methods [33, 90]. The major results (in the interpretation context of the study purpose) included 1) with the professionals, code reading detected more software faults and had a higher fault detection rate than did the other methods; 2) with the professionals, functional testing detected more faults than did structural testing, but the two did not differ in fault detection rate; 3) with the students, the three techniques did not differ in performance, except that structural testing detected fewer faults than did the others in one study phase; and 4) overall, code reading detected more interface faults and functional testing detected more control faults than did the other methods. A major result (in the interpretation context of the field of research) is that the study suggests that non-execution-based fault detection, as in code reading, is at least as effective as on-line methods. The particular programmers and programs sampled qualify the extrapolation of the results. The impact of the study is an advancement in the understanding of effective software testing methods.

In order to understand program debugging, [57] evaluated several related factors, including the effect of debugging aids, the effect of fault type, and the effect of the particular program debugged, from the perspective of the developer and maintainer. Thirty experienced programmers independently debugged one of four one-page programs that contained a single fault from one of three classes. The major results of these studies were 1) debugging is much faster if the programmer has had previous experience with the program; 2) assignment bugs were harder to find than other kinds; and 3) debugging aids did not seem to help programmers debug faster. Consistent results were obtained when the study was conducted on ten additional experienced programmers [58]. These results and the identification of possible "principles" of debugging contribute to the understanding of debugging methodology.

In order to improve experimental methodology and its application, [110] evaluated programmers' ability to understand and modify a program from the perspective of the developer and modifier. Various measures of programmer understanding were calculated, in a series of factorial design experiments, on groups of 16 - 48 university students performing tasks on two small programs. The study emphasized the need for well-structured and well-documented programs, and it both provided valuable testimony on, and worked toward, a suitable experimentation methodology.


In order to assess the impact of language features on the programming process, [53] characterized the relationship of language features to software reliability from the perspective of the developer. Based on an analysis of the deficiencies in a programming language, nine different features were modified to produce a new version. Fifty-one advanced students were divided into two groups and asked to complete implementations of two small but sophisticated programs (75 - 200 line) in the original language and its modified version. The redesigned features in the two languages were contrasted in program fault frequency, type, and persistence. The experiment identified several language-design decisions that significantly affected reliability, which contributes to the understanding of language design for reliable software.

In order to understand the unit testing process better, [80] evaluated a reading technique and functional and "selective" testing (a composite approach) from the perspective of the developer. Thirty-nine university students applied the techniques to three unit-size programs in a Latin square design. Functional and "selective" testing were equally effective and both superior to the reading technique, which contributed to our understanding of testing methodology.
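A Latin square design arranges n treatments in an n × n grid so that each treatment appears exactly once in every row and every column. With rows as subject groups and columns as programs, each group applies every technique once, and every program is examined under every technique once. The exact assignment used in [80] is not given here. The sketch builds a cyclic square and then randomly permutes its rows and columns, which is one standard construction. Group and program labels are hypothetical.

```python
import random


def cyclic_latin_square(treatments):
    """Row r, column c receives treatment (r + c) mod n, so each treatment
    appears exactly once in every row and every column."""
    n = len(treatments)
    return [[treatments[(row + col) % n] for col in range(n)] for row in range(n)]


def randomize_square(square, seed=None):
    """Permute rows and columns at random; the result is still a Latin square."""
    rng = random.Random(seed)
    n = len(square)
    rows = list(range(n))
    cols = list(range(n))
    rng.shuffle(rows)
    rng.shuffle(cols)
    return [[square[r][c] for c in cols] for r in rows]


def is_latin_square(square):
    """Check that every row and every column contains each symbol exactly once."""
    n = len(square)
    symbols = set(square[0])
    if len(symbols) != n:
        return False
    if not all(len(row) == n and set(row) == symbols for row in square):
        return False
    return all({square[r][c] for r in range(n)} == symbols for c in range(n))


if __name__ == "__main__":
    techniques = ["reading", "functional", "selective"]
    square = randomize_square(cyclic_latin_square(techniques), seed=1)
    for group, row in enumerate(square, start=1):
        # Hypothetical program labels P1..P3 for the three unit-size programs.
        print(f"group {group}:", dict(zip(["P1", "P2", "P3"], row)))
```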

In order to improve and better understand the maintenance process, [43] conducted two experiments to evaluate factors that influence two aspects of software maintenance, program understanding and modification, from the perspective of the developer and maintainer. Thirty-six junior through advanced professional programmers in each experiment examined three classes of small (38 - 57 source line) programs in a factorial design. The factors examined include control flow complexity, variable name mnemonicity, type of modification, degree of commenting, and the relationship of programmer performance to various complexity metrics. In [44] they continued the investigation of how software characteristics relate to psychological complexity, and presented a third experiment to evaluate the ability of 54 professional programmers to detect program bugs in three programs in a factorial design. The series of experiments showed that software science [59] and cyclomatic complexity [71] measures are related to the difficulty experienced by programmers in locating errors in code.
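The two families of measures mentioned above can be computed from simple counts. McCabe's cyclomatic complexity is V(G) = E - N + 2P for a control-flow graph with E edges, N nodes, and P connected components. Halstead's software science measures start from counts of distinct and total operators and operands. Operator and operand counting rules vary between languages and tools, so this sketch takes the counts as input rather than tokenizing source code. It is not a reproduction of the specific measures computed in [43, 44], and the class and function names are hypothetical.

```python
import math
from dataclasses import dataclass


def cyclomatic_complexity(edges: int, nodes: int, components: int = 1) -> int:
    """McCabe's V(G) = E - N + 2P for a control-flow graph with P connected components."""
    return edges - nodes + 2 * components


@dataclass
class HalsteadCounts:
    """Token counts from which Halstead's software science measures are built."""
    distinct_operators: int  # n1
    distinct_operands: int   # n2
    total_operators: int     # N1
    total_operands: int      # N2

    @property
    def vocabulary(self) -> int:      # n = n1 + n2
        return self.distinct_operators + self.distinct_operands

    @property
    def length(self) -> int:          # N = N1 + N2
        return self.total_operators + self.total_operands

    @property
    def volume(self) -> float:        # V = N * log2(n)
        return self.length * math.log2(self.vocabulary)

    @property
    def difficulty(self) -> float:    # D = (n1 / 2) * (N2 / n2)
        return (self.distinct_operators / 2) * (self.total_operands / self.distinct_operands)

    @property
    def effort(self) -> float:        # E = D * V
        return self.difficulty * self.volume


if __name__ == "__main__":
    # A hypothetical single-entry, single-exit unit with two binary decisions:
    # 8 nodes and 9 edges give V(G) = 3.
    print(cyclomatic_complexity(edges=9, nodes=8))
    counts = HalsteadCounts(distinct_operators=4, distinct_operands=4,
                            total_operators=8, total_operands=8)
    print(counts.volume, counts.difficulty, counts.effort)
```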

In order to improve and better understand program debugging, [108] evaluated the theory that "programmers use 'slicing' (stripping away a program's statements that do not influence a given variable at a given statement) when debugging" from the perspective of the developer, maintainer, and researcher. Twenty-one university graduate students and programming staff debugged a fault in three unit-size (75 - 150 source line) programs in a non-parametric design. The study results supported the slicing theory; that is, during debugging, programmers routinely partitioned programs into a coherent, discontiguous piece (or slice). The results advance the understanding of software debugging methodology.
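The slicing idea in the quoted theory can be made concrete with a small backward slice. This sketch assumes straight-line code only. Real slicers also follow control dependences through branches and loops, which is omitted here. The statement representation, function name, and example program are hypothetical. The example shows how the statements relevant to one variable form a discontiguous piece of the program.

```python
from typing import NamedTuple, Optional


class Stmt(NamedTuple):
    text: str
    defines: Optional[str]   # variable assigned by this statement, if any
    uses: frozenset          # variables read by this statement


def backward_slice(program, line, var):
    """Indices of the statements that can affect `var` just after statement `line`.

    Straight-line code only: with no branches or loops there is no control
    dependence, so data dependence alone determines the slice.
    """
    relevant = {var}
    kept = []
    for i in range(line, -1, -1):
        stmt = program[i]
        if stmt.defines in relevant:
            kept.append(i)
            relevant.discard(stmt.defines)  # this definition supplies the needed value...
            relevant |= stmt.uses           # ...but makes its own inputs relevant
    return sorted(kept)


program = [
    Stmt("n = read()", "n", frozenset()),
    Stmt("total = 0", "total", frozenset()),
    Stmt("count = 0", "count", frozenset()),
    Stmt("total = total + n", "total", frozenset({"total", "n"})),
    Stmt("count = count + 1", "count", frozenset({"count"})),
    Stmt("avg = total / count", "avg", frozenset({"total", "count"})),
    Stmt("print(total)", None, frozenset({"total"})),
]

if __name__ == "__main__":
    # Slice on `total` at the final statement: statements 0, 1, and 3.
    for i in backward_slice(program, 6, "total"):
        print(program[i].text)
```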

In order to improve design techniques, [87] evaluated flowcharts and program design languages (PDL) from the perspective of the developer. Twenty-two graduate students designed two small (approximately 1000 source line) projects, one using flowcharts and the other using PDL. Overall, the results suggested that design performance and designer-programmer communication were better for projects using PDL.

In order to validate a theory of programming knowledge, [101] conducted two studies, using 139 novices and 41 professional programmers, to evaluate programmer behavior from the perspective of the researcher. The theory was that programming knowledge contained programming plans (generic program fragments representing common action sequences) and rules of programming discourse (conventions used in composing plans into programs). The results support the existence and use of such plans and rules by both novice and advanced programmers.

Other blocked subject-project studies include [82, 112].

4.2. Replicated Project Studies

With a motivation to assess and better understand team software development methodologies, [15] conducted a study whose purpose was to characterize and evaluate the development processes (i.e., objects) of a) a disciplined-methodology team approach, b) an ad hoc team approach, and c) an ad hoc individual approach from the perspective of the developer and project manager. The development processes were examined in a replicated project scope, in which advanced university students comprising seven three-person teams, six three-person teams, and six individuals (from the programmer domain) used the approaches, respectively. They separately developed a small (600 - 2200 line) compiler (from the program domain) in a non-parametric design. Objective measurement of the development approaches was in several criteria areas: number of changes, number of program runs, program data usage, program data coupling/binding, static program size/complexity metrics, language usage, and modularity. Experiment preparation included presentation of relevant material [88, 7, 34], execution included automated monitoring of on-line development activity, and analysis used non-parametric comparison methods. The major results (in the interpretation context of the study purpose) included 1) the methodological discipline was a key influence on the general efficiency of the software development process; 2) the disciplined team methodology significantly reduced the costs of software development as reflected in program runs and changes; and 3) the examination of the effect of the development approaches was accomplished by the use of quantitative, objective, unobtrusive, and automatable process and product metrics. A major result (in the interpretation context of the field of research) is that the study supports the belief that incorporating discipline in software development reflects positively on both the development process and the final product. The particular programmers and program sampled qualify the extrapolation of the results. The impact of the study is an advancement in the understanding of software development methodologies and their evaluation.
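The text does not name the specific non-parametric comparison methods used in [15]. As one common illustration of comparing three or more independent groups without distributional assumptions, the sketch computes the Kruskal-Wallis H statistic. All observations are pooled and ranked, with ties sharing their average rank, and the groups' rank sums are then compared. The tie correction is omitted for brevity, which is a simplifying assumption. The function names are hypothetical, and the example numbers are made up rather than data from [15].

```python
def average_ranks(values):
    """1-based ranks of `values`, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1   # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic for independent groups (no correction for ties)."""
    pooled = [x for group in groups for x in group]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    return 12 * total / (n * (n + 1)) - 3 * (n + 1)


if __name__ == "__main__":
    # Made-up per-team counts for three development approaches.
    print(kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9]))
```

With k groups of adequate size, H is usually referred to a chi-square distribution with k - 1 degrees of freedom. For three groups, the 0.05 critical value is about 5.99. For very small groups, exact tables are normally preferred.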

In order to improve the design and implementation processes, [84] evaluated system modularity from the perspective of the developer. Twenty university undergraduates each developed one of four different types of implementations for one of five different small modules. Then each of the modules was combined with others to form several versions of the whole system. The major results suggested that minor effort was required in assembling the systems and that major system changes can be confined to small, well-defined subsystems. The results support the ideas on formal specifications and modularity discussed in [83, 85] and advance the understanding of design methodology.
In order to assess the impact of static typing of programming languages in the development process, [54] evaluated the use of a statically typed language (having integers and strings) and a "typeless" language (e.g., arbitrary subscripting of memory) from the perspective of the developer. Thirty-eight students programmed a small (48 - 297 source line) problem in both languages, with half doing it in each order. The two languages were compared in the resulting program faults, the number of runs containing faults, and the relation of subject experience to fault proneness. The major result was that the use of a statically typed language can increase programming reliability, which assists in the design and use of programming languages.

In order to improve program composition, comprehension, debugging, and modification, [98] evaluated the use of detailed flowcharts in these tasks from the perspective of the developer, maintainer, modifier, and researcher. Groups of 53 - 70 novice through intermediate subjects, in a series of five experiments, performed various tasks using small programs. No significant differences were found between groups that used flowcharts and those that did not, questioning the merit of using detailed flowcharts.

In order to improve and better understand the unit testing process, [79] evaluated the techniques of three-person walk-throughs, functional testing, and a control group from the perspective of the developer. Fifty-nine junior through advanced professional programmers applied the techniques to test a small (100 source line) but nontrivial program. The techniques did not differ in the number of faults they detected, all pairings of techniques were superior to single techniques, and code reviews were less cost-effective than the others. These results assist in the selection of appropriate software testing techniques.


In order to validate a particular metric family, [17] evaluated the ability of a proposed metric family to explain differences in system development methodologies and system changes from the perspective of the developer, project manager, and researcher. The metrics were applied to 19 versions of a small (600 - 2200 line) compiler, which were developed by teams of advanced university students using three different development approaches (see the first study [15] described in this section). The major results included 1) the metrics were able to differentiate among projects developed with different development methodologies; and 2) the differences among individuals had a large effect on the relationships between the metrics and aspects of system development. These results suggest insights into the formulation and appropriate use of software metrics.

In order to improve the understanding of why software errors occur, [65] characterized programmer misconceptions, cognitive strategies, and their manifestations as bugs in programs from the perspective of the developer and researcher. Two hundred four novice programmers separately attempted implementations of an elementary program. The results supported the programmers' intended use of "programming plans" [100] and revealed that most people preferred a read-process strategy over a process-read strategy. The results advance the understanding of how individuals write programs, why they sometimes make errors, and what programming language constructs should be available.

In order to understand the effect of coding conventions on program comprehensibility, [73] conducted a study to evaluate the relationship between indentation levels and program comprehension from the perspective of the developer. Eighty-six novice through professional subjects answered questions about one of seven program variations with different levels and types of indentation. The major result was that an indentation level of two or four spaces was preferred over zero or six.

In order to improve software development approaches, [29] characterized and evaluated the prototyping and specifying development approaches from the perspective of the developer, project manager, and user. Seven two- and three-person teams, consisting of university graduate students, developed versions of the same application software system (2000 - 4000 line); four teams used a requirement/design specifying approach and three teams used a prototyping approach. The systems developed by prototyping were smaller, required less development effort, and were easier to use. The systems developed by specifying had more coherent designs, more complete functionality, and software that was easier to integrate. These results contribute to the understanding of the merits and appropriateness of software development approaches.

In order to validate the theoretical model for N-version programming [66], [67, 3] conducted a study to evaluate the effectiveness of N-version programming for reliability from the perspective of the customer and user. N-version programming uses a high-level driver to connect several separately designed versions of the same system; the versions "vote" on the correct solution, and the solution provided by the majority of the versions is output. Twenty-seven graduate students were asked to independently design an 800 source line system. The factors examined included individual system reliability, total N-version system reliability, and classes of faults that occurred in systems simultaneously. The major result was that the assumption of independence of the faults in programs is not justified, and therefore the reliability of the combined "voting" system may not be as high as given by the model.
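The effect of the independence assumption can be illustrated with a majority voter and two reliability estimates. The first estimate uses a simple binomial model that assumes each version fails independently. This is an illustrative model and not necessarily the exact model of [66]. The second estimate counts how often a majority of versions was actually correct on a set of test inputs. The failure data below are hypothetical. Each of three versions fails on 10% of the inputs, but all three fail on the same input, so the observed reliability falls below the independence prediction. Function names are hypothetical.

```python
from collections import Counter
from math import comb


def majority_vote(outputs):
    """Value produced by a strict majority of the versions, or None if there is none."""
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) // 2 else None


def reliability_if_independent(n, p_fail):
    """P(a majority of n versions is correct) if each fails independently with prob. p_fail."""
    need = n // 2 + 1
    return sum(comb(n, k) * (1 - p_fail) ** k * p_fail ** (n - k) for k in range(need, n + 1))


def observed_reliability(failed):
    """Fraction of test inputs on which a majority of versions was correct.

    failed[i][v] is True when version v failed on input i.
    """
    need = len(failed[0]) // 2 + 1
    ok = sum(1 for row in failed if row.count(False) >= need)
    return ok / len(failed)


if __name__ == "__main__":
    print(majority_vote([42, 42, 41]))
    # Hypothetical coincident failures: all three versions fail on input 0 only.
    coincident = [[True, True, True]] + [[False, False, False]] * 9
    print(reliability_if_independent(3, 0.1))  # model prediction, about 0.972
    print(observed_reliability(coincident))    # 0.9 with these coincident failures
```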
In order to improve and better understand software development approaches, [94] characterized and evaluated the Cleanroom development approach [47, 46], in which software is developed without execution (i.e., completely off-line), from the perspective of the developer, project manager, and customer. Fifteen three-person teams of advanced university students separately developed a small system (300 - 2300 source line); ten teams used Cleanroom and five teams used a traditional development approach in a non-parametric design. The major results included 1) most developers using the Cleanroom approach were able to build systems without program execution; and 2) the Cleanroom teams' products met system requirements more completely and succeeded on more operational test cases than did those developed with a traditional approach. The results suggest the feasibility of complete off-line development, as in Cleanroom, and advance the understanding of software development methodology.

Other replicated project studies include [37, 5, 63].

4.3.  Multi-Project  Variation  Studies 

With a motivation to improve the understanding of resource usage during software
development, [4] conducted a study whose purpose was to predict development cost by
using a particular model (i.e., object) and to evaluate it from the perspective of the
project manager, corporate manager, and researcher. The particular model generation
method was examined in a multi-project scope, with baseline data from 18 large (2500 -
100,000 source line) software projects in the NASA S.E.L. production environment (from
the program domain), in which teams contained from two to ten programmers (from the
programmer domain) [10, 11, 38, 81]. The study design incorporated multivariate
methods to parameterize the model. Objective and subjective measurement of the
projects was based on 21 criteria² in three areas: methodology, complexity, and personnel
experience. Study preparation included preliminary work [52], execution included an
established set of data collection forms [10], and analysis used forward multivariate
regression methods. The major results (in the interpretation context of the study purpose)
included 1) the estimation of software development resource usage improved by
considering a set of both baseline and customization factors; 2) the application in the
NASA environment of the proposed model generation method, which considers both types of
factors, produced a resource usage estimate for a future project within one standard
deviation of the actual; and 3) the confirmation of the NASA S.E.L. formula that the cost
per line of reusing code is 20% of that of developing new code. A major result (in the
interpretation context of the field of research) is that the study highlights the
differences among software development environments, which influence the use of resource
estimation models. The particular programming environment and projects sampled qualify the
extrapolation of the results. The impact of the study is an advancement in the
understanding of estimating software development resource expenditure.
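Result 3) amounts to an equivalent-size adjustment before any cost model is applied. The sketch below is an assumption-laden illustration, not the meta-model of [4]: it simply charges each reused line at 20% of the cost of a new line, and the productivity parameter `hours_per_line` is hypothetical.

```python
# Assumed S.E.L. relationship: a reused line costs 20% of a newly developed line.
REUSE_COST_FRACTION = 0.20


def equivalent_new_lines(new_lines, reused_lines, reuse_fraction=REUSE_COST_FRACTION):
    """Size in 'new-line equivalents' used to drive a cost estimate."""
    return new_lines + reuse_fraction * reused_lines


def estimated_effort(new_lines, reused_lines, hours_per_line):
    """Effort estimate; hours_per_line is a hypothetical, environment-specific rate."""
    return hours_per_line * equivalent_new_lines(new_lines, reused_lines)


# 40,000 new lines plus 10,000 reused lines count as 42,000 new-line equivalents.
print(equivalent_new_lines(40_000, 10_000))
```

Because the 20% figure was confirmed for the NASA environment specifically, `reuse_fraction` is kept as a parameter; the study's own conclusion is that such constants depend on the development environment.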

In order to assess, manage, and improve multi-project environments, [26, 28, 106,
13, 38, 18, 82, 100, 97, 105] characterized, evaluated, and/or predicted the effect of
several factors from the perspective of the developer, modifier, project manager, and
corporate manager. All the studies examined moderate to large projects from production
environments. The relationships investigated were among various factors, including
structured programming, personnel background, development process and product
constraints, project complexity, human and computer resource consumption, error-prone
software identification, error/change distributions, data coupling/binding, project
duration, staff size, degree of management control, and productivity. These studies have
provided increased project visibility, greater understanding of classes of factors
sensitive to project performance, awareness of the need for project measurement, and
efforts for standardization of definitions. Analysis has begun on incorporating project
variation information into a management tool [16, 23].

² Twenty-one factors were selected after examining a total of 82 factors that possibly
contributed to project resource expenditure, including 38 from [108] and 16 from [28].

In order to improve and better understand the software maintenance process, [104]
conducted an experiment to evaluate the relationship between the rate of maintenance
repair and various product and process metrics from the perspective of the developer,
user, and project manager. A total of 447 small (up to 600 statements) commercial
and clerical COBOL programs from one Australian organization and two U.S. organizations
were analyzed. The product and process metrics included program complexity,
programming style, programmer quality, and number of system releases. The major
results were 1) in the Australian organization, program complexity and programming
style significantly affected the maintenance repair rate; and 2) in the U.S. organizations,
the number of times a system was released significantly affected the maintenance repair
rate.

In order to improve the software maintenance process, [1] evaluated operational
faults from the perspective of the user, customer, project manager, and corporate
manager. The fault history for nine large production products (e.g., operating system
releases or their major components) was empirically modeled. Adams developed an
approach for estimating whether, and under what circumstances, preventively fixing faults
in operational software in the field was appropriate. Preventively fixing faults consists
of installing fixes to faults that have yet to be discovered by particular users, but have
been discovered by the vendor or other users. The major result is that for the typical
user, corrective service is a reasonable way of dealing with most faults after the code has
been in use for a fairly long period of time, while preventively fixing high-rate faults is
advantageous during the time immediately following release.

In order to assess the effectiveness of the testing process, [31] evaluated estimates
of the number of residual faults in a system from the perspective of the customer,
developer, and project manager. The study was based on fault data collected from
three large (2000 - 6000 module) systems developed in the Hughes-Fullerton environment.
The study partitioned the faults based on severity and analyzed the differences in
estimates of remaining faults according to stage of testing. Insights were gained into
relationships between fault detection rates and residual faults.

4.4.  Single  Project  Studies 

With a motivation to improve software development methodology, [8] conducted a
study whose purpose was to characterize the process (i.e., object) of iterative
enhancement in conjunction with a top-down, stepwise refinement development approach from
the perspective of the developer. The development process was examined in a single
project scope, where the authors, two experienced individuals (from the programmer
domain), built a 17,000 line compiler (from the program domain). The study design
incorporated descriptive methods to capture system evolution. Objective measurement of
the system was in several criteria areas: size, modularity, local/global data usage, and
data binding/coupling [62, 102]. Study preparation included language design [9],
execution incorporated static analysis of system snapshots, and analysis used descriptive
statistics. The results (in the interpretation context of the statistical framework)
included 1) the percentage of global variables decreased over time while the percentage of
actual vs. possible data couplings across modules increased, suggesting the usage of global
data became more appropriate over time; and 2) the number of procedures and functions
rose over time while the number of statements per procedure or function decreased,
suggesting increased modularity. The major result of the study (in the interpretation
context of the study purpose) was that the iterative enhancement technique
encouraged the development of a software product that had several generally desirable
aspects of system structure. A major result (in the interpretation context of the field of
research) is that the study demonstrates the feasibility of iterative enhancement. The
particular programming team and project examined qualify the extrapolation of the
results. The impact of the study is an advancement in the understanding of software
development approaches.
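The "actual vs. possible data couplings" ratio can be computed from a static cross-reference of global variables. The sketch assumes the commonly used data binding definitions: a possible binding is a triple (p, r, q) of distinct procedures p and q that both have global r in scope, and an actual binding is one where p assigns r and q references r. The function name and input dictionaries are hypothetical, not the tooling used in [8].

```python
from itertools import permutations


def data_binding_counts(scope, assigns, references):
    """Count (actual, possible) data bindings (p, r, q) with p != q.

    scope:      global variable -> procedures that have it in scope
    assigns:    procedure -> globals it assigns
    references: procedure -> globals it references
    """
    actual = possible = 0
    for r, procs in scope.items():
        for p, q in permutations(sorted(procs), 2):  # ordered pairs of distinct procedures
            possible += 1
            if r in assigns.get(p, set()) and r in references.get(q, set()):
                actual += 1
    return actual, possible


# Hypothetical module: one global buffer visible to three procedures.
scope = {"buf": {"read", "parse", "emit"}}
assigns = {"read": {"buf"}}
references = {"parse": {"buf"}, "emit": {"buf"}}
actual, possible = data_binding_counts(scope, assigns, references)
print(actual, possible, actual / possible)  # 2 6 0.333...
```

A rising actual/possible ratio means that a larger share of the couplings made possible by global visibility are actually used, which is the sense in which the study reads it as more appropriate use of global data.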

In order to improve, better understand, and manage the software development
process, [6] evaluated the effect of applying chief programmer teams and structured
programming in system development from the perspective of the user, developer, project
manager, and corporate manager. The large (83,000 line) system, known as "The New
York Times Project," was developed by a team of professionals organized as a chief
programmer team, using structured code, top-down design, walk-throughs, and program
libraries. Several benefits were identified, including reduced development time and cost,
reduced time in system integration, and fewer faults detected in acceptance testing
and field use. The results of the study demonstrated the feasibility of the chief
programmer team concept and the accompanying methodologies in a production
environment.

In order to improve their development environments through increased understanding,
[49, 14, 2, 81, 19] each conducted single project studies to characterize the errors
and changes made during a development project. They examined the development of a
moderate to large software project, done by a multi-person team in a production
environment. They analyzed the frequency and distribution of errors during development
and their relationship with several factors, including module size, software complexity,
developer experience, method of detection and isolation, effort for isolation and
correction, phase of entrance into the system and observance, reuse of existing design and
code, and role of the requirements document. Such analyses have produced fault
categorization schemes and have been useful in understanding and improving a
development environment.

In order to improve design methodology, [55, 27] examined a ground-support system
written in Ada³ to characterize the use of Ada packages from the perspective of the
developer. Four professional programmers developed a project of 10,000 source lines of
code. Factors such as how package use affected the ease of system modification and
how to measure module change resistance were identified, as well as how these
observations related to aspects of the development and training. The major results were
1) several measures of Ada programs were developed, and 2) there was an indication that
extensive training will be necessary if we are to expect the facilities of Ada to be
properly used.

³ Ada is a trademark of the Department of Defense.

In order to assess and improve software testing methodology, [21, 88] characterized
and evaluated the relationship between system acceptance tests and operational usage
from the perspective of the developer, project manager, customer, and researcher. The
execution coverage of functionally generated acceptance test cases and a sample of
operational usage cases was monitored for a medium-size (10,000 line) software system
developed in a production environment. The results showed that 84% of the program
statements were executed during system operation and that the acceptance test
cases corresponded reasonably well to the operational usage. The results give insights
into the relationships among structural coverage, fault detection, system testing, and
system usage.
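The coverage quantities in this study reduce to set arithmetic over executed statement identifiers. The following sketch is a hypothetical illustration of how such percentages could be computed from execution traces; the function names and the toy statement sets are assumptions, not the instrumentation or data of [21, 88].

```python
def statement_coverage(executed, statements):
    """Fraction of the program's statements executed at least once."""
    return len(executed & statements) / len(statements)


def operational_overlap(test_executed, op_executed):
    """Fraction of the statements exercised in operation that the tests also exercised."""
    return len(test_executed & op_executed) / len(op_executed)


# Hypothetical statement IDs, not data from the study.
statements = set(range(1, 11))
acceptance = {1, 2, 3, 4, 5, 6, 9}
operational = set(range(1, 9))
print(statement_coverage(operational, statements))   # 0.8
print(statement_coverage(acceptance, statements))    # 0.7
print(operational_overlap(acceptance, operational))  # 0.75
```

The overlap measure is one simple way to quantify how well acceptance tests "correspond" to operational usage; other comparisons, such as weighting statements by execution frequency, are equally possible.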

5.  Problem  Areas  in  Experimentation 

The following sections identify several problem areas of experimentation in software
engineering. These areas may serve as guidelines in the performance of future studies.
After mentioning some overall observations, cautions in each of the areas of experiment
definition, planning, operation, and interpretation are discussed.
5.1. Experimentation Overall


There appears to be no "universal model" or "silver bullet" in software engineering.
There is an enormous number of factors that differ across environments, in terms of
desired cost/quality goals, methodology, experience, problem domain, constraints, etc.
[106, 26, 4, 13, 28]. As a result, every software development/maintenance/... environment
is different. Another area of wide variation is the many-to-one differential in
human performance [17, 45, 24]. The particular individuals examined in an empirical
study can make an enormous difference. Among other considerations, these variations
suggest that metrics need to be validated for a particular environment and a particular
person to show that they capture what is intended [17, 18]. Thus, experimental studies
should consider the potentially vast differences among environments and people.

5.2.  Experiment  Definition 

In the definition of the purpose for the experiment, the formulation of intuitive
problems into precisely stated goals is a nontrivial task [20, 22]. Defining the purpose of
a study often requires the articulation of what is meant by "software quality." The
many interpretations and perceptions of quality [32, 39, 72] highlight the need for
considering whose perspective of quality is being examined. Thus, a precise specification
of the problem to be investigated is a major step toward its solution.

5.3.  Experiment  Planning 

Experimental planning should have a horizon beyond a first experiment. Controlled
studies may be used to focus on the effect of certain factors, while their results
may be confirmed in replications [02, 08, 101, 110, 57, 58, 44, 43, 24] and/or larger case
studies [4, 15]. When designing studies, consider that a combination of factors may be
effective as a "critical mass," even though the particular factors may be ineffective when
treated in isolation [15, 105]. Note that formal designs and the resulting statistical
robustness are desirable, but we should not be driven exclusively by the achievement of
statistical significance. Common sense must be maintained, which allows us, for example,
to experiment just to help develop hypotheses [19, 109]. Thus, the experimental
planning process should include a series of experiments for exploration, verification, and
application.

5.4.  Experiment  Operation 

The collection of the required data constitutes the primary result of the study
operation phase. The data must be carefully defined, validated, and communicated to
ensure its consistent interpretation by all persons associated with the experiment:
subjects under observation, experimenters, and literature audience [18]. There have been
papers in the literature that do not define their data well enough to enable a comparison
of results across many projects and environments. We have often contacted the
experimenter only to discover that we were measuring different things. Thus, the
experimenter should be cautious about the definition, validation, and communication of
data, since they play a fundamental role in the experimental process.

5.5.  Experiment  Interpretation 

The appropriate presentation of results from experiments contributes to their
correct interpretation. Experimental results need to be qualified by the particular
samples (e.g., programmers, programs) analyzed [20]. The extrapolation of results from a
particular sample must consider the representativeness of the sample to other
environments [41, 111, 100, 86, 4, 28]. The visibility of the experimental results in
professional forums and the open literature provides valuable feedback and constructive
criticism. Thus, the presentation of experimental results should include appropriate
qualification and adequate exposure to support their proper interpretation.

6. Conclusion

Experimentation in software engineering supports the advancement of the field
through an iterative learning process. The experimental process has begun to be applied
in a multiplicity of environments to study a variety of software technology areas. From
the studies presented, it is clear that experimentation has proven effective in providing
insights and furthering our domain of knowledge about the software process and
product. In fact, there is a learning process in the experimentation approach itself, as has
been shown in this paper.

We have described a framework for experimentation to provide a structure for
presenting previous studies. We also recommend the framework as a mechanism to
facilitate the definition, planning, operation, and interpretation of past and future studies.
The problem areas discussed are meant to provide some useful recommendations for the
application of the experimental process in software engineering. The experimental
framework cannot be used in a vacuum; the framework and the lessons learned
complement one another and should be used in a synergistic fashion. This work contributes
to the understanding and advancement of experimentation in software engineering.




7.  References 


[1] E. N. Adams, Optimizing Preventive Service of Software Products, IBM Journal of Research and
     Development 28, 1, pp. 2-14, Jan. 1984.

[2] J.-L. Albin and R. Ferreol, Collecte et analyse de mesures de logiciel (Collection and Analysis of
     Software Data), Technique et Science Informatiques 1, 4, pp. 297-313, 1982. (Rairo ISSN
     0752-4072)

[3] A. Avizienis, P. Gunningberg, J. P. J. Kelly, L. Strigini, P. J. Traverse, K. S. Tso, and U. Voges,
     The UCLA DEDIX System: A Distributed Testbed for Multiple-Version Software, Digest
     Fifteenth Int. Sym. Fault-Tolerant Computing, Ann Arbor, MI, June 19-21, 1985.

[4] J. W. Bailey and V. R. Basili, A Meta-Model for Software Development Resource Expenditures,
     Proc. Fifth Int. Conf. Software Engr., San Diego, CA, pp. 107-118, 1981.

[5] J. W. Bailey, Teaching Ada: A Comparison of Two Approaches, Dept. Com. Sci., Univ. Maryland,
     College Park, MD, working paper, 1984.

[6] F. T. Baker, System Quality Through Structured Programming, AFIPS Proc. 1972 Fall Joint
     Computer Conf. 41, pp. 339-343, 1972.

[7] V. R. Basili and F. T. Baker, Tutorial of Structured Programming, Eleventh IEEE COMPCON,
     IEEE Cat. No. 75CH1049-8, 1975.

[8] V. R. Basili and A. J. Turner, Iterative Enhancement: A Practical Technique for Software
     Development, IEEE Transactions on Software Engineering SE-1, 4, Dec. 1975.

[9] V. R. Basili and A. J. Turner, SIMPL-T: A Structured Programming Language, Paladin House
     Publishers, Geneva, IL, 1978.

[10] V. R. Basili, M. V. Zelkowitz, F. E. McGarry, R. W. Reiter, Jr., W. F. Truszkowski, and D. L.
     Weiss, The Software Engineering Laboratory, Software Eng. Lab., NASA/Goddard Space
     Flight Center, Greenbelt, MD, Rep. SEL-77-001, May 1977.

[11] V. R. Basili and M. V. Zelkowitz, Analyzing Medium-Scale Software Developments, Proc. Third
     Int. Conf. Software Engr., Atlanta, GA, pp. 116-123, May 1978.

[12] V. R. Basili, Tutorial on Models and Metrics for Software Management and Engineering, IEEE
     Computer Society, New York, 1980.

[13] V. R. Basili and K. Freburger, Programming Measurement and Estimation in the Software
     Engineering Laboratory, Journal of Systems and Software 2, pp. 47-57, 1981.

[14] V. R. Basili and D. M. Weiss, Evaluation of a Software Requirements Document By Analysis of
     Change Data, Proc. Fifth Int. Conf. Software Engr., San Diego, CA, pp. 314-323, March 9-12,
     1981.

[15] V. R. Basili and R. W. Reiter, A Controlled Experiment Quantitatively Comparing Software
     Development Approaches, IEEE Trans. Software Engr. SE-7, May 1981.


[16] V. R. Basili and C. Doerflinger, Monitoring Software Development Through Dynamic Variables,
     Proc. COMPSAC, Chicago, IL, 1983.

[17] V. R. Basili and D. H. Hutchens, An Empirical Study of a Syntactic Metric Family, IEEE Trans.
     Software Engr. SE-9, 6, pp. 664-672, Nov. 1983.

[18] V. R. Basili, R. W. Selby, Jr., and T. Y. Phillips, Metric Analysis and Data Validation Across
     FORTRAN Projects, IEEE Trans. Software Engr. SE-9, 6, pp. 652-663, Nov. 1983.

[19] V. R. Basili and B. T. Perricone, Software Errors and Complexity: An Empirical Investigation,
     Communications of the ACM 27, 1, pp. 42-52, Jan. 1984.

[20] V. R. Basili and R. W. Selby, Jr., Data Collection and Analysis in Software Research and
     Management, Proceedings of the American Statistical Association and Biometric Society Joint
     Statistical Meetings, Philadelphia, PA, August 13-18, 1984.

[21] V. R. Basili and J. R. Ramsey, Structural Coverage of Functional Testing, Dept. Com. Sci., Univ.
     Maryland, College Park, Tech. Rep. TR-1442, Sept. 1984.

[22] V. R. Basili and D. M. Weiss, A Methodology for Collecting Valid Software Engineering Data,
     IEEE Trans. Software Engr. SE-10, 6, pp. 728-738, Nov. 1984.

[23] V. R. Basili and C. L. Ramsey, Arrowsmith-P - A Prototype Expert System for Software
     Engineering Management, Dept. Com. Sci., Univ. Maryland, College Park, Tech. Rep., 1985.
     (submitted to the Symposium on Expert Systems in Government, McLean, VA, Oct. 1985)

[24] V. R. Basili and R. W. Selby, Jr., Comparing the Effectiveness of Software Testing Strategies,
     Dept. Com. Sci., Univ. Maryland, College Park, Tech. Rep., 1985. (submitted to the IEEE
     Trans. Software Engr.)

[25] V. R. Basili and R. W. Selby, Jr., Four Applications of a Software Data Collection and Analysis
     Methodology, Proc. NATO Advanced Study Institute: The Challenge of Advanced Computing
     Technology to System Design Methods, Durham, U.K., July 29 - August 10, 1985.

[26] V. R. Basili and R. W. Selby, Jr., Calculation and Use of an Environment's Characteristic
     Software Metric Set, Proc. Eighth Int. Conf. Software Engr., London, August 28-30, 1985.

[27] V. R. Basili, E. E. Katz, N. M. Panlilio-Yap, C. L. Ramsey, and S. Chang, A Quantitative
     Characterization and Evaluation of a Software Development in Ada, IEEE Computer, September
     1985.

[28] B. W. Boehm, Software Engineering Economics, Prentice-Hall, Englewood Cliffs, NJ, 1981.

[29] B. W. Boehm, T. E. Gray, and T. Seewaldt, Prototyping Versus Specifying: A Multiproject
     Experiment, IEEE Trans. Software Engr. SE-10, 3, pp. 290-303, May 1984.

[30] R. C. Bogdan and S. K. Biklen, Qualitative Research for Education: An Introduction to Theory
     and Methods, Allyn and Bacon, Boston, MA, 1982.

[31] J. Bowen, Estimation of Residual Faults and Testing Effectiveness, Seventh Minnowbrook
     Workshop on Software Performance Evaluation, Blue Mountain Lake, NY, July 24-27, 1984.

[32] T. P. Bowen, G. B. Wigle, and J. T. Tsai, Specification of Software Quality Attributes, Rome Air
     Development Center, Griffiss Air Force Base, NY, Tech. Rep. RADC-TR-85-37 (three
     volumes), Feb. 1985.

[33] G. E. P. Box, W. G. Hunter, and J. S. Hunter, Statistics for Experimenters, John Wiley & Sons,
     New York, 1978.

[34] F. P. Brooks, Jr., The Mythical Man-Month, Addison-Wesley Publishing Co., Reading, MA, 1975.

[35] R. E. Brooks, Studying Programmer Behavior: The Problem of Proper Methodology,
     Communications of the ACM 23, 4, pp. 207-213, 1980.

[36] W. D. Brooks, Software Technology Payoff: Some Statistical Evidence, J. Systems and Software 2,
     pp. 3-9, 1981.

[37] F. O. Buck, Indicators of Quality Inspections, IBM Systems Products Division, Kingston, NY,
     Tech. Rep. 21.802, Sept. 1981.

[38] D. N. Card, F. E. McGarry, J. Page, S. Eslinger, and V. R. Basili, The Software Engineering
     Laboratory, Software Eng. Lab., NASA/Goddard Space Flight Center, Greenbelt, MD, Rep.
     SEL-81-104, Feb. 1982.

[39] J. P. Cavano and J. A. McCall, A Framework for the Measurement of Software Quality, Proc.
     Software Quality and Assurance Workshop, San Diego, CA, pp. 133-139, Nov. 1978.

[40] W. G. Cochran and G. M. Cox, Experimental Designs, John Wiley & Sons, New York, 1950.

[41] W. G. Cochran, Sampling Techniques, John Wiley & Sons, Inc., 1953.

[42] P. A. Currit, M. Dyer, and H. D. Mills, Certifying the Reliability of Software, IBM Corp., Federal
     Systems Division, 5600 Rockledge Dr., Bethesda, MD 20817, Tech. Rep., March 1985.
     (submitted to the IEEE Trans. Software Engineering)

[43] B. Curtis, S. B. Sheppard, P. Milliman, M. A. Borst, and T. Love, Measuring the Psychological
     Complexity of Software Maintenance Tasks with the Halstead and McCabe Metrics, IEEE
     Trans. Software Engr., pp. 95-104, March 1979.

[44] B. Curtis, S. B. Sheppard, and P. M. Milliman, Third Time Charm: Stronger Replication of the
     Ability of Software Complexity Metrics to Predict Programmer Performance, Proc. Fourth Int.
     Conf. Software Engr., pp. 356-360, Sept. 1979.

[45] B. Curtis, Cognitive Science of Programming, Sixth Minnowbrook Workshop on Software
     Performance Evaluation, Blue Mountain Lake, NY, July 19-22, 1983.

[46] M. Dyer and H. D. Mills, Developing Electronic Systems with Certifiable Reliability, Proc. NATO
     Conf., Summer, 1982.

[47] M. Dyer, Cleanroom Software Development Method, IBM Federal Systems Division, Bethesda,
     MD, October 14, 1982.

[48] T. Emerson, A Discriminant Metric for Module Cohesion, Proc. Seventh Intl. Conf. Software
     Engr., Orlando, FL, pp. 294-303, 1984.


[67]  J.  Knight  and  N.  Leveson,  A  Large  Scale  Experiment  in  N-Version  Programming.  Proe.  of  the 

Ninth  Annual  Software  Engineering  Workthop,  NASA/GSFC,  Greenbeit,  MD,  Nov.  1984. 

[68]  R.  C.  Linger,  H.  D.  Mills,  and  B.  I.  Witt,  Strnctured  Programming:  Theory  and  Praetiee, 

Addison-Wesley,  Reading,  MA,  1970. 

[69]  B.  Littlewood  and  J.  L.  Verrall,  A  Bayesian  Reliability  Growth  Model  for  Computer  Software, 

Applied  Statietie*  22,  3,  1973. 

[70]  B.  Littlewood.  Stochastic  Reliability  Growth:  A  Model  for  Fault  Renovation  Computer  Programs 

and  Hardware  Designs,  IEEE  Trane.  Reliability  R-30,  4.  Oct.  1981. 

[7l|  T.  J.  McCabe.  A  Complexity  Measure,  IEEE  Trane.  Software  Engr.  SE-2,  4.  pp.  308-320,  Dec. 
1976. 

[72]  J.  A.  McCall,  P.  Richards,  and  G.  Walters,  Factors  in  Software  Quality.  Rome  Air  Development 

Center,  Grifflss  Air  Force  Base,  NY,  Tech.  Rep.  RADC-TR-77-360,  Nov.  1977. 

[73]  R.  J.  Mlara,  J.  A.  Musselman,  J.  A.  Navarro,  and  B.  Shnetderman,  Program  Indentation  and 

Comprehensibility,  Communication  of  the  ACM  26,  n,  pp.  861-867,  Ncv.  1983. 

[74]  T.  Moher  and  G.  M.  Schneider.  Methodology  and  Experimental  Research  in  Software  Engineering, 

International  Journal  of  Man-Machine  Studiee  16.  1.  pp.  65-87,  1982. 

[75]  S.  A  Mulalk,  The  Foundatione  of  Factor  Analytic,  M.-Graw-Hlll,  New  York.  1972. 

[76]  J.  D.  Musa,  A  Theory  of  Software  Reliability  and  Its  Application.  IEEE  Trane.  Software  Engr. 

SE-1.  3.  pp.  312-327,  1975. 

[77]  J.  D.  Musa,  Software  reliability  measurement.  Journal  of  Syeteme  and  Software  1,  3.  pp.  223-241, 

1980. 

[78]  G.  J.  Myers,  Compoeite/ Structured  Deeign,  Van  Nostrand  Reinhold.  1978. 

[79]  G.  J.  Myers,  A  Controlled  Experiment  in  Program  Testing  and  Code  Walkthroughs/Inspections, 

Communication  of  the  ACM,  pp.  760-768,  Sept.  1978. 

[80]  J.  Neter  and  W.  Wasserman,  Applied  Linear  Statietieal  Modele,  Richard  D.  Irwin,  Inc.,  Home- 

wood,  IL.  1974. 

[81]  T.  J.  Ostrand  and  E.  J.  Weyuker,  Collecting  and  Categorizing  Software  Error  Data  in  an  Indus¬ 

trial  Environment.  Dept.  Com.  Sci„  Courant  Inst.  Math.  Scl„  New  York  Unlv.,  NY.  Tech. 

Rep.  47,  August  1982  (Revised  May  1983). 

[82]  D.  J.  Paul,  Experience  with  Automatic  Program  Testing,  Proc.  NBS  Trende  and  Applications. 

Nat.  Bureau  Stds.,  Gaithersburg.  MD.  pp.  25-28,  May,  28  1981. 

[83]  D.  L.  Parnas,  On  the  Criteria  to  be  Used  in  Decomposing  Systems  into  Modules.  Communications 

of  the  ACM  15.  12,  pp.  1053-1068,  1972. 

[84]  D.  L.  Parnas,  Some  Conclusions  from  an  Experiment  in  Software  Engineering  Techniques,  AFIPS 

Proe.  1972  Fall  Joint  Computer  Conf.  41,  pp.  325-329,  1972. 


30 


[•40]  A.  Endres.  An  Analysis  of  Errors  and  their  Causes  In  Systems  Programs,  IEEE  Trans.  Software 
Engr.,  pp.  140-149,  June  1975. 

[50] A. R. Feuer and E. B. Fowlkes, Some Results from an Empirical Study of Computer Software, Proc. Fourth Int. Conf. Software Engr., pp. 351-355, 1979.

[51] R. W. Floyd, Assigning Meanings to Programs, Am. Math. Soc. 19, ed. J. T. Schwartz, Providence, RI, 1967.

[52] K. Freburger and V. R. Basili, The Software Engineering Laboratory: Relationship Equations, Dept. Com. Sci., Univ. Maryland, College Park, Tech. Rep. TR-764, May 1979.

[53] J. D. Gannon and J. J. Horning, The Impact of Language Design on the Production of Reliable Software, Trans. Software Engr. SE-1, pp. 179-191, 1975.

[54] J. D. Gannon, An Experimental Evaluation of Data Type Conventions, Communications of the ACM 20, 8, pp. 584-596, 1977.

[55] J. D. Gannon, E. E. Katz, and V. R. Basili, Characterizing Ada Programs: Packages, The Measurement of Computer Software Performance, Los Alamos National Laboratory, Aug. 1983.

[56] A. L. Goel, Software Reliability and Estimation Techniques, Rome Air Development Center, Griffiss Air Force Base, NY, Rep. RADC-TR-82-263, October 1982.

[57] J. D. Gould and P. Drongowski, An Exploratory Study of Computer Program Debugging, Human Factors 16, 3, pp. 258-277, 1974.

[58] J. D. Gould, Some Psychological Evidence on How People Debug Computer Programs, International Journal of Man-Machine Studies 7, pp. 151-182, 1975.

[59]  M.  H.  Halstead,  Elements  of  Software  Science,  North  Holland,  New  York,  1977. 

[60] W. C. Hetzel, An Experimental Analysis of Program Verification Methods, Ph.D. Thesis, Univ. of North Carolina, Chapel Hill, 1976.

[61] C. A. R. Hoare, An Axiomatic Basis for Computer Programming, Communications of the ACM 12, 10, pp. 576-583, Oct. 1969.

[62] D. H. Hutchens and V. R. Basili, System Structure Analysis: Clustering with Data Bindings, IEEE Trans. Soft. Engr. SE-11, 8, Aug. 1985.

[63] S.-S. V. Hwang, An Empirical Study in Functional Testing, Structural Testing, and Code Reading/Inspections, Dept. Com. Sci., Univ. of Maryland, College Park, Scholarly Paper 362, Dec. 1981.


[64] Z. Jelinski and P. B. Moranda, Applications of a Probability-Based Model to a Code Reading Experiment, Proc. IEEE Symposium on Computer Software Reliability, New York, pp. 78-81, IEEE, 1973.

[65] W. L. Johnson, S. Draper, and E. Soloway, An Effective Bug Classification Scheme Must Take the Programmer into Account, Proc. Workshop High-Level Debugging, Palo Alto, CA, 1983.

[66] J. P. J. Kelly, Specification of Fault-Tolerant Multi-Version Software: Experimental Studies of a Design Diversity Approach, UCLA Ph.D. Thesis, 1982.


[102] W. P. Stevens, G. J. Myers, and L. L. Constantine, Structured Design, IBM Systems Journal 13, 2, pp. 115-139, 1974.

[103] L. G. Stucki, New Directions in Automated Tools for Improving Software Quality, in Current Trends in Programming Methodology, ed. R. T. Yeh, Prentice Hall, Englewood Cliffs, NJ, 1977.

[104] I. Vessey and R. Weber, Some Factors Affecting Program Repair Maintenance: An Empirical Study, Communications of the ACM 26, 2, pp. 125-134, Feb. 1983.

[105] J. Vosburgh, B. Curtis, R. Wolverton, B. Albert, H. Malec, S. Hoben, and Y. Liu, Productivity Factors and Programming Environments, Proc. Seventh Int. Conf. Software Engr., Orlando, FL, pp. 143-152, 1984.

[106] C. E. Walston and C. P. Felix, A Method of Programming Measurement and Estimation, IBM Systems J. 16, 1, pp. 54-73, 1977.

[107] G. Weinberg, The Psychology of Computer Programming, Van Nostrand Reinhold Co., 1971.

[108] M. Weiser, Programmers Use Slices When Debugging, Communications of the ACM 25, pp. 446-452, July 1982.

[109] D. M. Weiss and V. R. Basili, Evaluating Software Development by Analysis of Changes: Some Data from the Software Engineering Laboratory, IEEE Trans. Software Engr. SE-11, 2, pp. 157-168, February 1985.

[110] L. Weissman, Psychological Complexity of Computer Programs: An Experimental Methodology, SIGPLAN Notices 9, 6, pp. 25-36, June 1974.

[111] R. Wolverton, The Cost of Developing Large-Scale Software, IEEE Trans. Computers 23, 6, 1974.

[112] S. N. Woodfield, H. E. Dunsmore, and V. Y. Shen, The Effect of Modularization and Comments on Program Comprehension, Dept. Com. Sci., Arizona St. Univ., Tempe, AZ, working paper, 1981.

[113] J. C. Zolnowski and D. B. Simmons, Taking the Measure of Program Complexity, Proc. National Computer Conference, pp. 320-336, 1981.


[85] D. L. Parnas, A Technique for Module Specification with Examples, Communications of the ACM 15, May 1972.

[86] L. Putnam, A General Empirical Solution to the Macro Software Sizing and Estimating Problem, IEEE Trans. Software Engr. 4, 4, 1978.

[87] H. R. Ramsey, M. E. Atwood, and J. R. Van Doren, Flowcharts Versus Program Design Languages: An Experimental Comparison, Communications of the ACM 26, 6, pp. 445-449, June 1983.

[88] J. Ramsey, Structural Coverage of Functional Testing, Seventh Minnowbrook Workshop on Software Performance Evaluation, Blue Mountain Lake, NY, July 24-27, 1984.

[89] Statistical Analysis System (SAS) User's Guide, SAS Institute Inc., Box 8000, Cary, NC 27511, 1982.

[90]  H.  Scheffe,  The  Analysis  of  Variance,  John  Wiley  &  Sons,  New  York,  1959. 

[91] Annotated Bibliography of Software Engineering Laboratory (SEL) Literature, Software Eng. Lab., NASA/Goddard Space Flight Center, Greenbelt, MD, Rep. SEL-82-006, Nov. 1982.

[92] R. W. Selby, Jr., An Empirical Study Comparing Software Testing Techniques, Sixth Minnowbrook Workshop on Software Performance Evaluation, Blue Mountain Lake, NY, July 19-22, 1983.

[93] R. W. Selby, Jr., Evaluations of Software Technologies: Testing, CLEANROOM, and Metrics, Dept. Com. Sci., Univ. Maryland, College Park, Ph.D. Dissertation, 1985.

[94] R. W. Selby, Jr., V. R. Basili, and F. T. Baker, CLEANROOM Software Development: An Empirical Evaluation, Dept. Com. Sci., Univ. Maryland, College Park, Tech. Rep. TR-1415, February 1985 (submitted to the IEEE Trans. Software Engr.).

[95] J. G. Shanthikumar, A Statistical Time Dependent Error Occurrence Rate Software Reliability Model with Imperfect Debugging, Proc. 1981 National Computer Conference, June 1981.

[96] B. A. Sheil, The Psychological Study of Programming, Computing Surveys 13, pp. 101-120, March 1981.

[97] V. Y. Shen, T. J. Yu, S. M. Thebaut, and L. R. Paulsen, Identifying Error-Prone Software: An Empirical Study, IEEE Trans. Soft. Engr. SE-11, 4, pp. 317-324, April 1985.

[98] B. Shneiderman, R. E. Mayer, D. McKay, and P. Heller, Experimental Investigations of the Utility of Detailed Flowcharts in Programming, Communications of the ACM 20, 6, pp. 373-381, 1977.

[99] S. Siegel, Nonparametric Statistics for the Behavioral Sciences, McGraw-Hill, New York, 1955.

[100] E. Soloway, K. Ehrlich, J. Bonar, and J. Greenspan, What Do Novices Know About Programming?, in Directions in Human-Computer Interactions, ed. A. Badre and B. Shneiderman, Ablex, Inc., 1982.

[101] E. Soloway and K. Ehrlich, Empirical Studies of Programming Knowledge, Trans. Software Engr. SE-10, 5, pp. 595-609, Sept. 1984.


Figure 1. Summary of the framework of experimentation

I. Definition
   Motivation: Understand, Assess, Manage, Engineer, Learn, Improve, Validate, Assure
   Object: Product, Process, Model, Metric, Theory
   Purpose: Characterize, Evaluate, Predict, Motivate
   Perspective: Developer, Modifier, Maintainer, Project manager, Corporate manager, Customer, User, Researcher
   Domain: Programmer, Program/project
   Scope: Single project, Multi-project, Replicated project, Blocked subject-project

II. Planning
   Design
      Experimental designs: Incomplete block, Completely randomized, Randomized block, Fractional factorial
      Multivariate analysis: Correlation, Factor analysis, Regression
      Statistical models: Non-parametric, Sampling
   Criteria
      Direct reflections of cost/quality: Cost, Errors, Changes, Reliability, Correctness
      Indirect reflections of cost/quality: Data coupling, Information visibility, Programmer comprehension, Execution coverage, Size, Complexity
   Measurement
      Metric definition: Goal-question-metric, Factor-criteria-metric
      Metric validation
      Data collection: Automatability, Form design and test, Objective vs. subjective
      Level of measurement: Nominal/classificatory, Ordinal/ranking, Interval, Ratio

III. Operation
   Preparation: Pilot study
   Execution: Data collection, Data validation
   Analysis
      Quantitative vs. qualitative
      Preliminary data analysis: Plots and histograms, Model assumptions
      Primary data analysis: Model application

IV. Interpretation
   Interpretation context: Statistical framework, Study purpose, Field of research
   Extrapolation: Sample representativeness
   Impact: Visibility, Replication, Application

